Application of improved grey wolf model in collaborative trajectory optimization of unmanned aerial vehicle swarm

With advances in science, technology, and the economy, unmanned aerial vehicles (UAVs) are used ever more widely. However, existing UAV trajectory planning methods are costly and have a low level of intelligence. The grey wolf optimization algorithm (GWO) has therefore been applied to the collaborative trajectory optimization of UAV swarms, but it suffers from weak cooperation between individuals. In this study, a pheromone factor is introduced into the traditional GWO to improve it. To address the unstable performance of swarm intelligence optimization algorithms under dynamic threats, deep reinforcement learning is used to optimize the model. A UAV swarm trajectory planning model was constructed based on the improved grey wolf algorithm. In the experiments, the optimal fitness value of the improved grey wolf algorithm was 0.43 lower than that of the grey wolf algorithm. Compared with the other algorithms, its fitness value was significantly lower and its stability was higher. In complex scenarios, the improved grey wolf algorithm had a trajectory length of 70.51 km and a planning time of 5.92 s, which was clearly superior to the other algorithms. The path length planned by the model designed in this study was 58.476 km, which was significantly shorter than that of the other three models. The planning time was 5.33 s and the number of path extension points was 46. All indicator values of the UAV swarm trajectory planning model designed in this study were smaller than those of the other three models. These results show that the model can achieve low-cost trajectory optimization and provide more reasonable technical support for UAV mission execution.


Proposal of unmanned aerial vehicle trajectory planning issues
This study considers fixed-wing UAVs, which are known for their high speed, long range, and large load capacity and are suitable for long-range reconnaissance, target tracking, and intelligence gathering. The Dubins turn is a complex flight maneuver that is usually used for fast turning of fixed-wing UAVs in limited space. It requires high maneuverability and a flight control system capable of accurate trajectory control. Using Dubins turns in fixed-wing UAVs adds operational complexity, but it also lets the UAV respond more flexibly to a variety of mission requirements. The representation of a UAV flight trajectory is often tied to the actual flight task and problem requirements [16][17][18]. The study therefore represents a trajectory as a sequence of three-dimensional (3D) trajectory points. Because the UAV must maintain a high planning success rate when facing new threat information, the final optimal trajectory is presented as a smoother curve. The trajectory is represented by Eq. (1).
$$L = \{A, P_1, P_2, \ldots, P_n, B\} \tag{1}$$
In Eq. (1), $A$ is the starting position of the flight, $P_1$ and $P_2$ are trajectory points in the planning process, and $B$ is the target location. UAV trajectory planning involves four main aspects: environmental information, planning methods, trajectory, and flight constraints. Considering all four reflects the actual flight situation of the UAV and meets its feasibility requirements, which makes the planned trajectory realistic and reliable. The maximum range, flight altitude threshold, flight speed, turning angle, and minimum turning radius are chosen as constraints. The flight area of the UAV is a relatively vast 3D space. The UAV's own motion constraints and the spatial threat information are combined to set the position coordinates of the UAV in the 3D flight space. Common ways of expressing the planning space include the discrete grid method and continuous-space methods such as the Voronoi diagram, the visibility graph, and the connecting line method. The connecting line method optimizes the path segment by segment from the starting point and has advantages such as simple implementation and short solving time. For ease of description, the study uses the connecting line method for 3D spatial planning. The 3D planning space is represented by Eq. (2).
In Eq. (2), $x$ is the longitude of the UAV in the space, $y$ is its latitude, and $z$ is its altitude. A schematic diagram of the two-dimensional UAV trajectory is shown in Fig. 1.
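The waypoint representation of Eq. (1) and the constraints listed above can be illustrated with a minimal sketch. The function names, the choice of checking only range, altitude, and horizontal turning angle, and the sample track are assumptions made for illustration and are not taken from the study.

```python
import math

def path_length(track):
    """Total length of a track given as a list of (x, y, z) points."""
    return sum(math.dist(p, q) for p, q in zip(track, track[1:]))

def max_turn_angle(track):
    """Largest horizontal heading change (radians) between consecutive segments."""
    worst = 0.0
    for a, b, c in zip(track, track[1:], track[2:]):
        h1 = math.atan2(b[1] - a[1], b[0] - a[0])
        h2 = math.atan2(c[1] - b[1], c[0] - b[0])
        # Wrap the heading difference into [-pi, pi) before taking its size.
        turn = abs((h2 - h1 + math.pi) % (2 * math.pi) - math.pi)
        worst = max(worst, turn)
    return worst

def is_feasible(track, max_range, z_min, z_max, max_turn):
    """Hypothetical feasibility check: range, altitude band, and turning angle."""
    return (path_length(track) <= max_range
            and all(z_min <= p[2] <= z_max for p in track)
            and max_turn_angle(track) <= max_turn)

# L = {A, P1, P2, B} with illustrative coordinates.
track = [(0, 0, 1), (3, 4, 1), (6, 8, 1), (6, 12, 1)]
```

A track that violates any one of the three checks is rejected, which mirrors how flight constraints restrict the set of candidate trajectories before any cost is evaluated.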
The radius of each threat area is represented by $R_m$ and describes the threat coverage range. Because multiple fixed-wing UAVs must cross several threat sources with different threat levels on the way to the target point, the probability of being attacked increases once a UAV enters a threat area. The study categorizes the threats encountered in UAV missions into three types: enemy radiation sources, terrain obstacles, and weather in the flight environment. Threats are represented by numerical quantification. For the same threat, the threat probability changes with the radius of action. The study multiplies the height corresponding to the radius of action by the threat probability to obtain the threat cost. The height data are represented by Eq. (3).
In Eq. (3), $(x_{0i}, y_{0i})$ is the position of the $i$-th threat center, $T(x, y)$ is the height information corresponding to the threat, $T_i$ is the strength of threat $i$, and $x_{si}$ and $y_{si}$ are attenuation coefficients. The superposition of terrain data and threat altitude data is represented by Eq. (4).
In Eq. (4), $Z(x, y)$ is the terrain data obtained by superimposing the threat height on the terrain height, $P(R)$ is the probability of the threat at a distance $R$ from the $i$-th threat center, and $H(x, y)$ is the terrain height before superposition. A UAV needs to fly at low altitude to reduce the probability of being detected during a mission. For this purpose, three threat models are constructed: a terrain model, a radar threat model, and an electronic interference threat model. The mathematical model of the terrain threat is represented by Eq. (5).
In Eq. (5), $a, b, d, e, f, g$ are constants. Terrain undulations are simulated by changing these constants. In general, radar detection is the first consideration for a UAV flying a mission in an area with hazardous radar radiation sources. The detection range of the radar radiation source determines the degree of threat to the UAV during mission execution. The relationship between radar detection probability, radar false alarm probability, and maximum radar operating range is represented by Eq. (6).
In Eq. (6), $P_{fa}$ is the radar false alarm probability and $P_d$ is the radar detection probability. $P_1$ and $P_2$ represent the detection probabilities of the radar at two different distances $R_1$ and $R_2$, respectively. From this, the radar detection rate in Eq. (7) can be calculated.
In Eq. (7), $R$ is the radial distance between the UAV and the threat, and $R_{max}$ is the maximum operating range of the radar. The elevation data model of the radar threat is obtained by integrating the radar threat range and the detection probability, and is represented by Eq. (8).
In Eq. (8), $K_r$ is a performance coefficient related to the radar and $(x_0, y_0)$ is the coordinate of the radar center. The electronic interference threat refers to an attack method in the enemy's prevention and control system that interferes with the electromagnetic pulses recorded by flying targets. A UAV mainly uses GPS for navigation, so when the navigation signal is interfered with, the UAV loses control. The elevation data of the electronic interference threat model satisfy Eq. (9).
In Eq. (9), $(x_m, y_m, z_m)$ is the center point coordinate of the electronic interference source and $H_m(x, y)$ is the elevation data. After overlaying the electronic interference model on the terrain, Eq. (10) is obtained.
After constructing the three threat models, the study applies them to UAV trajectory planning as threat factors that must be avoided during mission execution. The 3D effects of the radar threat and the electronic interference threat are shown in Fig. 2.
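The idea of converting threats into equivalent elevation data and superimposing them on the terrain can be sketched as follows. The Gaussian-shaped bump and the use of an element-wise maximum for the overlay are assumptions chosen for illustration; the exact forms of Eqs. (3), (4), and (10) are not reproduced here, and all names are hypothetical.

```python
import math

def threat_height(x, y, threats):
    """Equivalent threat height at (x, y).

    Each threat is (x0, y0, strength, xs, ys): center, strength T_i, and the
    two attenuation coefficients. A Gaussian-shaped decay is assumed.
    """
    return sum(
        strength * math.exp(-((x - x0) / xs) ** 2 - ((y - y0) / ys) ** 2)
        for x0, y0, strength, xs, ys in threats
    )

def overlay(terrain_height, x, y, threats):
    """Assumed superposition rule: keep the higher of terrain and threat."""
    return max(terrain_height, threat_height(x, y, threats))

threats = [(10.0, 10.0, 2.0, 3.0, 3.0)]  # one illustrative threat source
```

With this representation, a planner can treat terrain and threats uniformly: any waypoint whose altitude lies below the overlaid surface is inside either the ground or a threat volume.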

Unmanned aerial vehicle trajectory planning based on I-GWO
The trajectory planning of UAV clusters in a 3D environment is a highly challenging, multi-constrained optimization problem. The study uses I-GWO to solve it [19][20][21]. GWO has a simple structure, few parameters to adjust, and a good balance between exploration and exploitation, so it is widely used in various fields. Wolf packs generally hunt collectively, and the social hierarchy of grey wolves plays an important role [22][23]. Under the leadership of the head wolf, stronger grey wolves assist in decision-making, while old and weak grey wolves accept the leadership and protection of the others. The hunting process of grey wolves is divided into three steps: chasing and tracking prey, surrounding prey, and attacking prey. The GWO procedure is shown in Fig. 3.
The behavior of grey wolves chasing prey during hunting is defined by Eq. (11).
In Eq. (11), $D$ is the distance between an individual grey wolf and its prey, $C$ and $A$ are coefficient vectors, $X(t)$ and $X_P(t)$ are the position vectors of the grey wolf and the prey, respectively, and $t$ is the iteration number. The second expression in Eq. (11) is the equation for updating the position of the grey wolves. $D_P(t)$ is the distance between the prey's position and the center point. The two coefficient vectors are represented by Eq. (12).
In Eq. (12), $a$ is the convergence factor, whose value decreases linearly from 2 to 0 with the number of iterations. $r_1$ and $r_2$ are random numbers within [0, 1]. The mathematical model of grey wolves tracking individuals is represented by Eq. (13).
In Eq. (13), $D_\varsigma$ represents the distance between grey wolf $\varsigma$ and the other wolves, and $X_\varsigma$ is the current position vector of $\varsigma$. $X(t)$ is the position of a grey wolf, which is updated during the iteration process according to the position of the optimal wolf. The attack and search processes of GWO are shown in Fig. 4.
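For orientation, one iteration of the position update can be sketched using the commonly published form of GWO: $A = 2a r_1 - a$, $C = 2 r_2$, $D = |C X_P - X|$, and a new position equal to the mean of the three leader-guided candidates. This is the textbook variant and is assumed, not confirmed, to match Eqs. (11) to (13); the function name and the list-based data layout are illustrative.

```python
import random

def gwo_step(wolves, fitness, t, max_iter, rng=random):
    """One iteration of the standard GWO update for a minimization problem.

    wolves   -- list of positions, each a list of floats
    fitness  -- callable mapping a position to a cost (lower is better)
    t        -- current iteration index, 0 <= t <= max_iter
    """
    leaders = sorted(wolves, key=fitness)[:3]   # alpha, beta, delta
    a = 2.0 * (1.0 - t / max_iter)              # decreases linearly from 2 to 0
    new_wolves = []
    for x in wolves:
        new_x = []
        for j in range(len(x)):
            candidates = []
            for leader in leaders:
                A = 2.0 * a * rng.random() - a  # |A| > 1 favors search, |A| < 1 attack
                C = 2.0 * rng.random()
                D = abs(C * leader[j] - x[j])   # distance to this leader
                candidates.append(leader[j] - A * D)
            new_x.append(sum(candidates) / 3.0)
        new_wolves.append(new_x)
    return new_wolves
```

When $a$ reaches 0 the coefficient $A$ vanishes, so every wolf moves to the mean position of the three leaders, which is the attack phase in its limiting form.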
Although GWO performs well, the lack of correlation between parameter changes across iterations results in slow convergence and makes the algorithm prone to being trapped in local optima [24][25][26]. Therefore, inspired by the pheromone of the Ant Colony (AC) algorithm, this study introduces pheromone factors into GWO to enhance cooperation between wolves [27]. In addition, the pheromones are optimized according to the hierarchical system of the wolf pack and then used to guide subsequent iterations. The pheromone increment is represented by Eq. (14). In Eq. (14), $Q_P$ represents the influence of the new pheromones left by a grey wolf, $P$ is the wolf pack level, and $\alpha, \beta, \delta$ represent the first, second, and third social levels of the pack. $d_{i,i+1}$ is the length of the path from point $i$ to point $i+1$. $\Delta\tau_{i,i+1}(t)$ is the pheromone increment on the path from point $i$ to point $i+1$ passed by the wolves of the three levels at the $t$-th iteration. In each iteration, the existing pheromones decay, while the pheromones of the new optimal wolves are superimposed. The update of the pheromone concentration is therefore represented by Eq. (15).
In Eq. (15), $\tau_{i,i+1}(t)$ represents the pheromone concentration and $\rho$ is the pheromone attenuation factor, which determines the rate at which pheromones decrease over time. Different attenuation factors suit different scenarios, so in practice the choice must be weighed against the characteristics of the specific problem. A smaller attenuation factor can be selected for scenarios that require fast convergence, and a larger one for scenarios that require higher diversity. Different attenuation factors can also be used at different stages of the algorithm to maintain diversity while increasing convergence speed. Fast-convergence scenarios tend to occur where time requirements are strict and the problem space is relatively simple. For example, in real-time control systems or online optimization tasks, the algorithm must find the best solution quickly to meet the real-time requirements of the system. Scenarios requiring high diversity usually arise in complex problem spaces that contain multiple local optima and whose global optimum is difficult to find directly. For example, in pattern recognition, image processing, or machine learning, algorithms must process large amounts of data and features and extract useful information for decision-making or classification. In these scenarios, high diversity helps the algorithm explore a broader problem space, avoid falling into local optima prematurely, and thus improve the chance of finding the global optimum.
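The decay-then-deposit behavior described for Eqs. (14) and (15) can be sketched in the classic ant-colony form $\tau(t+1) = (1-\rho)\,\tau(t) + \Delta\tau(t)$ with $\Delta\tau = \sum_P Q_P / d_{i,i+1}$. This form is an assumption consistent with the symbol definitions above; the dictionary layout and the per-level values of $Q_P$ are hypothetical.

```python
def update_pheromone(tau, leader_paths, q, rho=0.25):
    """Evaporate existing pheromone, then let the three leaders deposit more.

    tau          -- dict mapping an edge (i, j) to its pheromone concentration
    leader_paths -- dict mapping 'alpha'/'beta'/'delta' to a list of
                    (edge, length) pairs on that wolf's path
    q            -- dict mapping each level to its deposit strength Q_P
    rho          -- attenuation factor (0.25 is the value listed in Table 1)
    """
    new_tau = {edge: (1.0 - rho) * value for edge, value in tau.items()}
    for level, path in leader_paths.items():
        for edge, length in path:
            # Shorter segments receive a larger increment.
            new_tau[edge] = new_tau.get(edge, 0.0) + q[level] / length
    return new_tau
```

Under this form a larger $\rho$ erases old trails faster, which weakens the pull of earlier leaders and keeps the search more diverse, while a smaller $\rho$ lets trails accumulate and speeds up convergence.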
Only the three best wolves are allowed to leave pheromones on the path points they pass through. This highlights the guiding role of the locally optimal trajectory obtained in each iteration and reduces the computational complexity of the algorithm. When generating its next path point, each wolf may have several feasible alternatives. The probability of each feasible path point being selected is represented by Eq. (16).
In Eq. (16), $H(i)$ is the set of all feasible next path points from the $i$-th path point. $\gamma$ and $\xi$ are positive constants that express the grey wolves' preference for pursuing and for selecting the current path, respectively. $\eta_{i,h_i}$ is the heuristic factor and $\tau_{i,h_i}(t)$ is the pheromone concentration. I-GWO is shown in Fig. 5.
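A selection rule of the kind described for Eq. (16) is usually written as $p_h = \tau_h^{\gamma}\,\eta_h^{\xi} / \sum_{k \in H(i)} \tau_k^{\gamma}\,\eta_k^{\xi}$. The sketch below assumes that form, including which exponent attaches to which factor; the function names and default exponents are illustrative.

```python
import random

def selection_probabilities(tau, eta, gamma=1.0, xi=2.0):
    """Probability of each feasible next point from pheromone and heuristic values.

    tau, eta -- equal-length lists over the candidate set H(i)
    gamma    -- assumed exponent on the pheromone concentration
    xi       -- assumed exponent on the heuristic factor
    """
    weights = [(t ** gamma) * (e ** xi) for t, e in zip(tau, eta)]
    total = sum(weights)
    return [w / total for w in weights]

def choose_next(candidates, probs, rng=random):
    """Roulette-wheel draw of one candidate according to its probability."""
    return rng.choices(candidates, weights=probs, k=1)[0]
```

For example, with pheromone values `[1, 1, 2]` and heuristic values `[1, 2, 1]`, the default exponents weight the second candidate most heavily because the heuristic term is squared.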
As mentioned earlier, the study uses the connecting line method to select UAV trajectory points, dividing the span between the starting point and the target point into $D$ equal parts. The total cost function of UAV cluster trajectory planning is composed of fuel and threat costs. The threat cost is inversely proportional to the distance between the threat source and the UAV. In summary, the UAV path is solved with I-GWO, and the least costly waypoints are found during the iterations. Because the traditional GWO depends more heavily on the initial parameter values than other algorithms, the study screened the best initial parameter combination experimentally and obtained the I-GWO parameters shown in Table 1 [28].
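A cost of the kind described above, fuel plus threat with the threat term inversely proportional to distance, might be sketched as follows. Using track length as the fuel proxy, counting a threat only inside its radius, and the two weight names are assumptions for illustration and do not reproduce the weighting in Table 1.

```python
import math

def track_cost(track, threats, w_fuel=1.0, w_threat=1.0):
    """Weighted sum of a fuel term and a threat term for one candidate track.

    track   -- list of (x, y, z) waypoints
    threats -- list of ((x0, y0), radius) threat circles in the horizontal plane
    """
    # Fuel is approximated by the total flown distance.
    fuel = sum(math.dist(p, q) for p, q in zip(track, track[1:]))
    threat = 0.0
    for p in track:
        for center, radius in threats:
            d = math.dist(p[:2], center)
            if d < radius:
                # Inversely proportional to distance; guard against d == 0.
                threat += 1.0 / max(d, 1e-6)
    return w_fuel * fuel + w_threat * threat
```

A fitness function like this is what the optimizer would minimize over candidate waypoint sets, so a detour is accepted only when the threat reduction outweighs the extra distance.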

Optimization strategy for trajectory planning model based on reinforcement learning
As the UAV navigation environment becomes increasingly complex, sudden threats also increase, which can cause swarm intelligence algorithms to fail at trajectory replanning [29][30]. Therefore, the study uses reinforcement learning for trajectory replanning and proposes an improved strategy for actual trajectory planning based on the Deep Deterministic Policy Gradient (DDPG) algorithm, which enables dynamic optimization of the trajectory planning model. During trajectory replanning, the UAV can obtain its position, its position relative to the target, its speed, and other information through sensors and intelligence. Combining these three kinds of information gives the state at any time during replanning, represented by Eq. (17).
In Eq. (17), $(x_{u,t}, y_{u,t}, h_{u,t})$ is the coordinate position of the UAV in the flight space at time $t$, $(dx_t, dy_t, dh_t)$ is the relative position between the UAV and the target, and $(v_{x,t}, v_{y,t}, v_{z,t})$ are the velocity components of the UAV in the three directions. The action of the UAV is represented by Eq. (18). In Eq. (18), $\phi_t$ is the direction angle of the UAV and $\theta_t$ is its pitch angle. The UAV motion in the study follows a constant acceleration model. To improve the convergence of trajectory replanning, non-sparse rewards are used. When facing dynamic threats, the UAV must reach the target location as quickly as possible while avoiding threats such as radar detection and electronic interference. The selected threat models all depend on the distance to the UAV, so the study uses real-time distance as the reward factor for trajectory replanning. The distance-based reward is subdivided into four types: negative rewards for voyage, boundary, and threat, and a positive reward for arrival. The negative reward for the voyage is represented by Eq. (19).

Table 1. Parameters of I-GWO.
Parameter | Value
Population size | 50
Pheromone attenuation factor | 0.25
Weight coefficients 1, 2, and 3 of the cost function | $10^{-2}$, 10

In Eq. (19), $N(\cdot)$ is the normalization formula, $d$ is the range already flown by the UAV, and $l_{max}$ is the maximum flight range allowed by the fuel the UAV carries. When the UAV reaches the target point, the system returns a positive reward, represented by Eq. (20).
In Eq. (20), $P_{u,t}$ is the position of the UAV executing the task, $P_{t,t}$ is the location of the target point, and $\varphi_{max}$ is the maximum detection distance of the UAV. Boundary and threat negative rewards are set in the same way, and the final reward is the weighted sum of the individual rewards. The improved trajectory planning model is shown in Fig. 6.
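The reward structure described above, a voyage penalty, an arrival bonus, and a weighted sum of all terms, can be sketched as follows. The specific shapes (a penalty linear in the flown fraction of $l_{max}$, a fixed bonus inside the detection distance) and the weights are assumptions for illustration, not the forms of Eqs. (19) and (20).

```python
import math

def voyage_reward(flown, l_max):
    """Negative reward that grows with the fraction of the maximum range used."""
    return -min(flown / l_max, 1.0)

def arrival_reward(uav_pos, target_pos, detect_range, bonus=1.0):
    """Positive reward once the target lies within the UAV's detection distance."""
    return bonus if math.dist(uav_pos, target_pos) <= detect_range else 0.0

def total_reward(terms, weights):
    """Weighted sum of the individual reward terms, keyed by name."""
    return sum(weights[name] * value for name, value in terms.items())

# Illustrative step: half the range flown, target just within detection distance.
terms = {
    "voyage": voyage_reward(50.0, 100.0),
    "arrival": arrival_reward((0.0, 0.0, 0.0), (3.0, 4.0, 0.0), 5.0),
}
weights = {"voyage": 0.2, "arrival": 1.0}  # hypothetical weights
```

Because every step returns a nonzero voyage term, the agent receives feedback long before it reaches the target, which is the point of a non-sparse reward.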
The study combines value functions with policy gradients and adopts DDPG for dynamic optimization of the model so that it can better handle dynamic and sudden threats. The algorithm consists of two parts, a policy network and a value function network, and the value function network evaluates the actions generated by the policy network. The resulting error is used to correct and optimize action selection, as represented by Eq. (21).
In Eq. (21), $\theta$ is the direction angle, $\nabla_\theta J(\mu_\theta)$ is the gradient of the objective function with respect to the policy parameters, $\sigma$ is a gradient, $\mu_\theta$ is a deterministic strategy, and $\pi_{\mu_\theta,\sigma}$ is a random strategy. The updates of the value function and policy networks are represented by Eq. (22).
In Eq. (22), $\mu(s|\theta^{\mu})$ and $Q(s, a|\theta^{Q})$ represent the policy and value function networks, respectively, and $\theta^{\mu}$ and $\theta^{Q}$ are their parameters. $s$ is the UAV state information, $N$ is a real number based on the environment, and $L'$ is the updated value of the policy network. The algorithm framework is shown in Fig. 7.
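The interaction between the two networks can be illustrated with the three scalar operations that standard DDPG applies at each training step: a bootstrapped target for the value function, a mean squared error loss over a batch, and a soft update of the target parameters. This follows the generally published DDPG procedure and is not claimed to match Eq. (22) term by term; the discount and mixing rates are typical defaults, not values from the study.

```python
def td_target(reward, next_q, done, discount=0.99):
    """Target for Q(s, a): reward plus discounted value of the next state."""
    return reward + (0.0 if done else discount * next_q)

def critic_loss(q_values, targets):
    """Mean squared error between predicted Q values and their targets."""
    n = len(q_values)
    return sum((y - q) ** 2 for q, y in zip(q_values, targets)) / n

def soft_update(target_params, online_params, tau=0.005):
    """Move each target parameter a small step toward its online counterpart."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]
```

In a full implementation the policy network is then updated by ascending the value function's estimate of $Q(s, \mu(s))$, which requires an automatic differentiation library and is omitted here.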
Hyper-parameter adjustment in this algorithm directly affects the convergence of the final model, and the model is highly sensitive to it. Therefore, the study combines the algorithm with neural networks to optimize its grid parameters and improve training accuracy. The reward factors obtained are used as the fitness function, and the weights are optimized using neural networks to improve the fitting effect. With more UAVs, the system actions require more complex and refined planning and coordination. When assigning tasks to each UAV, the Particle Swarm Optimization (PSO) algorithm is used to find the optimal task assignment scheme and improve the efficiency and performance of the system. Multi-agent cooperative optimization is used to further improve the precision and cooperative efficiency of UAV flight path planning. Specifically, the overall optimization of the UAV swarm is realized by building a multi-agent system in which each UAV is treated as an independent agent and the agents exchange information and make decisions collaboratively. Figure 8 presents the operation of the system in 3D form.

Performance analysis of unmanned aerial vehicle swarm trajectory planning optimization model based on I-GWO
A series of simulation experiments was designed in MATLAB to verify the performance of the proposed trajectory planning model. A comparative experiment was conducted with currently popular centralized optimization algorithms to verify the improvement over GWO and the rationality of using this algorithm for trajectory planning. The scale of the environment has a significant influence on UAV cluster trajectory planning. A larger scale gives the UAV more space for flight path planning, but it also increases the complexity and difficulty of planning. In a vast environment, the UAV must take into account more terrain, obstacles, and potential threats. The scale also affects the detection and communication range of the UAV: in larger environments, UAVs may need to fly longer distances to reach their target points, and maintaining communication with other UAVs or command centers becomes more difficult. Under three different mountain conditions, several algorithms were used to replan the trajectory, including the Cuckoo Algorithm (CA), the AC algorithm, PSO, GWO, and I-GWO. The comparison results are shown in Fig. 9. In Fig. 9, I-GWO and GWO produced re-planned trajectory lengths of 62.54 km and 60.81 km, respectively, a difference of 1.73 km, indicating an increase in the range of the algorithm after improvement. The other three algorithms had re-planned trajectory lengths greater than 80 km, which required at least 19.19 km of additional range relative to GWO. The range of CA reached 105.22 km, which did not meet the shortest-track requirement. A stability simulation analysis was conducted on the five algorithms to further explore the performance of I-GWO, as shown in Fig. 10. According to Fig. 10, the maximum difference between fitness values was 5.45 for I-GWO and 5.88 for the traditional GWO, a difference of 0.43. Compared with the other three optimization algorithms, I-GWO had a lower optimal fitness value and a smaller fluctuation range. Therefore, the stability of I-GWO was not significantly different from that before the improvement, but it was higher than that of the other algorithms. An increase in the number of threats makes it more difficult to plan UAV flight paths. When there are more threats in the environment, UAVs must plan their flight paths more finely to avoid the threat areas. The increased threats also have a detrimental effect on flight performance: to avoid threat areas, UAVs may need to maneuver and adjust altitude frequently, which increases their power consumption and flight time. In cluster operations, each UAV must cooperate with the other UAVs to complete combat tasks jointly. However, when there are many threats in the environment, coordination between UAVs may become more complex and difficult, and may even lead to communication interruptions and coordination failures. The trajectory planning effectiveness of each optimization algorithm was therefore compared in three different scenarios. There were 5 threat sources in scenario 1, 7 in scenario 2, and 10 in scenario 3. The comparison results are shown in Table 2.

As the number of threat sources increased, the average cost and trajectory length of each algorithm increased. In the simpler scenario 1, the average cost value and cost standard deviation obtained by I-GWO and the original GWO were close to those obtained by AC. In the more complex scenarios 2 and 3, I-GWO had significantly better values than the other algorithms in all indicators except running time. As the complexity of the scene increased, the advantage of I-GWO in trajectory planning became more significant. To further test the superiority of I-GWO, the study ran training iterations of each algorithm in the three scenarios and recorded the training results in Fig. 11.
From Fig. 11, I-GWO achieved the target cost and began to converge after only about 20 iterations. Although the convergence speed of the traditional GWO was basically the same as that of the improved algorithm, its convergence accuracy was lower. The other three algorithms all converged after about 30 iterations, with a convergence accuracy basically the same as that of the traditional GWO. Therefore, I-GWO had the highest convergence accuracy. In response to sudden dynamic threats, the study improved the model using reinforcement learning. The UAV swarm trajectory planning model designed in this study (Model 1) was compared with existing advanced UAV trajectory planning models for dynamic threats to test the improvement and the collaborative planning effect of the improved model. The comparative models were a UAV online trajectory planning model based on a real-time search prediction algorithm (Model 2), a dynamic trajectory planning model based on an improved constrained differential evolution algorithm (Model 3), and a trajectory planning model based on an improved heuristic search algorithm (Model 4). The trajectory planning results of the four models under single, double, and multiple threat zones are shown in Fig. 12.
In Fig. 12a, under the same conditions, the path lengths calculated by Models 2, 3, and 4 were 75.512 km, 78.348 km, and 80.457 km, respectively. The path length of Model 1 was 58.476 km, which was significantly shorter than that of the other models. In Fig. 12b, when encountering dual burst threats, the planned path lengths of Models 2, 3, and 4 were 76.412 km, 79.445 km, and 81.025 km, respectively, while that of Model 1 was 60.512 km. In Fig. 12c, the planned path lengths of Models 2, 3, and 4 were 81.545 km, 86.941 km, and 90.441 km, respectively. The path curve calculated by Model 1 met the smoothness requirements of UAV navigation paths, and its planned path length was significantly shorter than that of the other models. To further compare and analyze the performance of the models, the study compared the path planning length, number of extension points, planning time, and heading changes of the four models in the three scenarios. The specific results are shown in Fig. 13.
According to Fig. 13, Model 1 produced the shortest path length. Across the three scenarios, the average planning length of Model 1 was 18% shorter than that of Model 2, and 20% and 23% shorter than those of Models 3 and 4, respectively. In terms of planning time and the number of path extension points, the indicator values of Model 1 were also smaller than those of the other models. This indicated that Model 1 had better real-time performance than the other models, and fewer path extension points help the UAV adjust its flight attitude and speed. In addition, the experiment examined the heading changes of the four models in the three scenarios to test whether the models behaved in extreme ways when sudden threats appeared. The changes in heading are shown in Fig. 14.
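The relative reduction in path length can be checked for individual scenarios with a short calculation based on the lengths quoted for Fig. 12a and Fig. 12b. Only these two scenarios are used because the Model 1 length for the multiple-threat case is not stated in the text, so the per-scenario values printed here are not expected to equal the three-scenario averages; the helper name is illustrative.

```python
# Path lengths in km as quoted in the text for Fig. 12a and Fig. 12b.
lengths_km = {
    "single": {"Model 1": 58.476, "Model 2": 75.512, "Model 3": 78.348, "Model 4": 80.457},
    "dual": {"Model 1": 60.512, "Model 2": 76.412, "Model 3": 79.445, "Model 4": 81.025},
}

def reduction(ours, other):
    """Fraction by which `ours` is shorter than `other`."""
    return (other - ours) / other

for scenario, values in lengths_km.items():
    for name in ("Model 2", "Model 3", "Model 4"):
        pct = 100.0 * reduction(values["Model 1"], values[name])
        print(f"{scenario:6s} vs {name}: {pct:.1f}% shorter")
```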
According to Fig. 14, the curve of Model 1 changed smoothly and the heading fluctuated between 0.2 and 0.4π, whereas Models 2, 3, and 4 all exhibited significant heading changes. By comparison, the heading change of Model 1 was relatively small and its flight process was relatively stable. This showed that Model 1 could ensure the normal flight of the UAV in a real flight environment. The above comparison indicates that Model 1 has excellent planning performance. To further visualize and verify its planning effectiveness, Model 1 was used for dynamic planning in simulation software. The planning results are shown in Fig. 15. According to Fig. 15, Model 1 effectively avoided known static threats and sudden mobile threats while tracking the reference trajectory, and the trajectory was smooth. Therefore, Model 1 had good trajectory planning performance and could help the UAV change its trajectory in real time.
An additional experiment was carried out to test the performance and scalability of the proposed method in a large-scale UAV swarm scenario. The scenario consists of dozens of UAVs that must work together to complete a task in a complex dynamic environment. These UAVs must not only avoid static obstacles but also deal with sudden mobile threats. Model 1 was applied to this scenario to verify its planning performance in a large-scale swarm. The experimental results are shown in Table 3.
From Table 3, as the number of UAVs increased, the average planning length, average planning time, and average number of path extension points all increased, but the average heading change remained relatively stable. This shows that Model 1 can maintain stable heading changes in large-scale swarm scenarios and thus ensure the flight stability of the UAVs. The success rate decreased slightly as the number of UAVs increased, but it remained high overall, indicating that Model 1 still has a high planning success rate in large-scale scenarios. These results suggest that Model 1 performs well in large-scale UAV swarm scenarios, can cope with complex dynamic environments, and can realize collaborative task planning. Model 1 also scales well and can adapt to UAV swarms of different sizes, which supports the future application of large-scale UAV swarms.

Conclusion
Trajectory planning techniques are now widely applied, but current techniques cannot achieve ideal planning results. To this end, the study built I-GWO by introducing a pheromone factor and combined it with DRL to optimize UAV trajectories under both static and dynamic conditions. In the experiments, I-GWO and GWO produced re-planned trajectory lengths of 62.54 km and 60.81 km, respectively, a difference of 1.73 km. After the improvement, the algorithm's range was increased. Compared to the other algorithms, the planned path length of I-GWO was significantly smaller. I-GWO achieved the target cost in about 20 iterations and began to converge. Although the convergence speed of the traditional GWO was basically the same as that of the improved algorithm, its convergence accuracy was lower than that of I-GWO. The path length of Model 1 was 58.476 km, which was significantly shorter than that of the other models. As the number of experiments with Model 1 increased, the curve changed smoothly and the heading fluctuated between 0.2 and 0.4π, with no obvious sudden change in heading. Model 1 could effectively avoid obstacles while tracking the reference trajectory and achieve low-cost trajectory planning, and the trajectory was relatively smooth. Therefore, the constructed model has excellent performance in UAV trajectory planning and can complete effective trajectory replanning in complex environments. This study only conducted small-scale trajectory planning during the UAV flight. However, in practical scenarios, the scale of a UAV swarm will be relatively large. Therefore, the operational performance of the model on a larger scale needs further exploration in future research. At present, the research simplifies each UAV into a particle model and controls only its pitch and yaw angles, without considering the specific UAV model in more detail. In future research, more specific physical models can be added to further optimize the trajectory problem. In practical applications, UAVs are subject to many uncertain factors, such as tornadoes, rainstorms, and other extreme weather that affect UAV flight, sudden changes in tasks, and insufficient battery power. In future research, these factors can be incorporated into the model to improve the robustness of the planning model.
Wind not only affects the flight speed and stability of a UAV but also poses additional challenges to flight path planning. To address this, the wind effect needs to be incorporated into flight path planning. First, the UAV dynamics model needs to be further refined, and a wind field model needs to be introduced to simulate the wind effect. Future work can simulate different wind field conditions, including wind speed, wind direction, and the spatial distribution of the wind field, and compare the processing time of different algorithms under windy conditions. The time each algorithm requires to complete trajectory planning can be recorded, and the efficiency of the algorithms under windy conditions can be evaluated by comparing these times.
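As a starting point for such a wind field model, the effect of a uniform wind on one straight segment can be sketched with the standard wind triangle: the UAV crabs into the wind so that its ground track stays on the segment, and the ground speed follows from the along-track and cross-track wind components. A constant, horizontal wind and the function names are assumptions made for illustration.

```python
import math

def ground_speed(airspeed, track_heading, wind_speed, wind_to_heading):
    """Ground speed along the desired track under a uniform wind.

    Headings are in radians; wind_to_heading is the direction the wind blows
    toward. Returns None if the crosswind cannot be compensated.
    """
    rel = wind_to_heading - track_heading
    along = wind_speed * math.cos(rel)   # tailwind (+) or headwind (-)
    cross = wind_speed * math.sin(rel)   # component the UAV must crab against
    if abs(cross) >= airspeed:
        return None
    speed = along + math.sqrt(airspeed ** 2 - cross ** 2)
    return speed if speed > 0 else None

def segment_time(length, airspeed, track_heading, wind_speed, wind_to_heading):
    """Time to fly one straight segment, or infinity if it cannot be flown."""
    gs = ground_speed(airspeed, track_heading, wind_speed, wind_to_heading)
    return math.inf if gs is None else length / gs
```

Summing `segment_time` over the segments of a candidate track would give a wind-aware flight time that could replace pure distance in a cost function.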

Figure 1. Two-dimensional navigation trace diagram of the UAV.

Figure 2. 3D rendering of radar threat and electronic jamming threat. (a) 3D map of radar threat, (b) 3D map of electronic interference threat.

Figure 4. Attack and search processes of the grey wolf algorithm. (a) Attack, (b) Search.

Figure 7. The algorithmic framework for combining value function networks and policy networks.

Figure 8. Three-dimensional representation of system operation.

Figure 12. Track planning results of four models in different sudden threat regions. (a) Single threat zone, (b) Dual threat zone, (c) Multiple threat zones.

Figure 13. Comparison of path planning effects of the model. (a) Path planning length, (b) Path extension point, (c) Path planning time.

Table 2. Track planning results of each algorithm in three scenarios.

Table 3. Experimental results of Model 1 in large-scale UAV swarm scenarios.